European Web Retrieval Experiments at WebCLEF 2006
Abstract
Hummingbird participated in the WebCLEF mixed monolingual retrieval task of the Cross-Language Evaluation Forum (CLEF) 2006. In this task, the system was given 1939 known-item queries, and the goal was to find the desired page in the 82 GB EuroGOV collection (3.4 million pages crawled from government sites of 27 European domains). The 1939 queries comprised 124 new manually-created queries, 195 manually-created queries from last year, and 1620 automatically-generated queries. In our experiments, the results on the automatically-generated queries were not always predictive of the results on the manually-created queries; in particular, our title-weighting and duplicate-filtering techniques were fairly effective on the manually-created queries but were detrimental on the automatically-generated queries.
Similar Resources
Index Combinations and Query Reformulations for Mixed Monolingual Web Retrieval
We examine the effectiveness on the multilingual WebCLEF 2006 test set of light-weight methods that have proved successful in other web retrieval settings: combinations of document representations on the one hand and query reformulation techniques on the other. We investigate a range of approaches to crosslingual web retrieval using the test suite of the mixed monolingual CLEF 2006 WebCLEF trac...
Multilingual Web Retrieval Experiments with Field Specific Indexing Strategies for CLEF 2006 at the University of Hildesheim
For WebCLEF 2006 we experimented with the analysis and extraction of the HTML structure of the web documents. In addition, blind relevance feedback was applied in the search process. As in 2005, the experiments were carried out with a language independent indexing strategy. We experimented with HTML title, H1 element and other elements emphasizing text. Our index contained title and H1, emphasi...
Overview of WebCLEF 2006
We report on the CLEF 2006 WebCLEF track devoted to crosslingual web retrieval. We provide details about the retrieval tasks, the used topic set, and the results of WebCLEF participants. WebCLEF 2006 used a stream of known-item topics consisting of: (i) manual topics (including a selection of WebCLEF 2005 topics, and a set of new topics) and (ii) automatically generated topics (generated using ...
Multilingual Web Retrieval in the Context of WebCLEF 2006
This paper describes retrieval experiments with a large multilingual corpus carried out at the University of Hildesheim as part of WebCLEF 2006. The focus was on the use of HTML structural elements, the application of blind relevance feedback, and the evaluation of the language-independent indexing approach.
EuroGOV: Engineering a Multilingual Web Corpus
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites. The corpus contains over 3 million documents written in more than 20 different European languages...